Abstract



We consider the problem of estimating an input signal from noisy measurements in both parallel scalar Gaussian
channels and linear mixing systems. The performance of the estimation process is quantified by the $\ell_\infty$ norm error
metric. We first study the minimum mean $\ell_\infty$ error estimator in parallel scalar Gaussian channels, and verify that,
when the input is independent and identically distributed (i.i.d.) mixture Gaussian, the Wiener filter is asymptotically
optimal with probability 1. For linear mixing systems with i.i.d. sparse Gaussian or mixture Gaussian inputs, under
the assumption that the relaxed belief propagation (BP) algorithm matches Tanaka's fixed point equation, applying the
Wiener filter to the output of relaxed BP is also asymptotically optimal with probability 1. However, to address
the practical problem where the signal dimension is finite, we apply an estimation algorithm proposed in
our previous work, and illustrate that an $\ell_\infty$ error minimizer can be approximated by an $\ell_p$ error minimizer provided
the value of $p$ is properly chosen.

Index Terms

Belief propagation, estimation theory, $\ell_\infty$ norm error, linear mixing systems, parallel scalar channels.

I. Introduction 


A. Motivation 

The Gaussian distribution is widely used to describe the probability densities of various types of data, owing to
its mathematical advantages [2]. It has been shown that non-Gaussian distributions can often be represented by an
infinite mixture of Gaussians [3], so that the mathematical advantages of the Gaussian distribution can be preserved
when discussing non-Gaussian data models [4,5].

A set of parallel scalar Gaussian channels with a mixture Gaussian input vector has been used to model image
denoising problems [4-6], while linear mixing systems are popular models in many settings such as compressed
sensing [7,8], regression [9,10], and multiuser detection [11]. Signal reconstruction from noisy measurements is
prevalent in the literature, but the minimization of the $\ell_\infty$ error has received less attention. Our interest in the $\ell_\infty$
error is motivated by applications such as group testing [12] and trajectory planning in control systems [13], where
we want to decrease the worst-case sensitivity to noise.

B. Problem setting 

We describe parallel scalar Gaussian channels and linear mixing systems below. In both settings, the input
vector $\mathbf{x}$ is independent and identically distributed (i.i.d.) mixture Gaussian, i.e., $x_i \sim s_1 \cdot \mathcal{N}(0,\mu_1) + s_2 \cdot \mathcal{N}(0,\mu_2) + \cdots + s_K \cdot \mathcal{N}(0,\mu_K)$, where $s_1, s_2, \ldots, s_K > 0$ are given, $\sum_{k=1}^{K} s_k = 1$, and $\mu_1, \mu_2, \ldots, \mu_K$ are also
given. The subscript $(\cdot)_i$ denotes the $i$-th element of the corresponding vector. As a specific case, we study the
i.i.d. $s$-sparse Gaussian, i.e., $x_i \sim s \cdot \mathcal{N}(0,\mu_x) + (1-s)\,\delta(x_i)$ for some given $s$ and $\mu_x$.
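As a minimal sketch (standard-library Python; the function name `sample_mixture_gaussian` is our own and is not part of any package used in this paper), the i.i.d. mixture Gaussian input can be drawn by first picking a component with probabilities $s_k$ and then sampling from $\mathcal{N}(0,\mu_k)$. We assume here that a component with $\mu_k = 0$ stands for the point mass $\delta(x_i)$, so the $s$-sparse Gaussian is the special case with weights $(s, 1-s)$ and variances $(\mu_x, 0)$.

```python
import math
import random


def sample_mixture_gaussian(n, weights, variances, rng=random):
    """Draw n i.i.d. samples x_i ~ sum_k s_k * N(0, mu_k).

    weights   -- s_1, ..., s_K (positive, summing to 1)
    variances -- mu_1, ..., mu_K (assumption: a zero variance means a point mass at 0)
    rng       -- the random module or a random.Random instance (for reproducibility)
    """
    if len(weights) != len(variances):
        raise ValueError("weights and variances must have the same length")
    if any(s <= 0 for s in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be positive and sum to 1")
    # Step 1: choose a mixture component for every entry.
    comps = rng.choices(range(len(weights)), weights=weights, k=n)
    # Step 2: draw each entry from its component's zero-mean Gaussian.
    return [rng.gauss(0.0, math.sqrt(variances[k])) if variances[k] > 0 else 0.0
            for k in comps]


# Example: the s-sparse Gaussian with s = 0.05 and mu_x = 1.
rng = random.Random(0)
x = sample_mixture_gaussian(10_000, [0.05, 0.95], [1.0, 0.0], rng)
print(sum(v != 0.0 for v in x) / len(x))  # fraction of nonzeros, close to 0.05
```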

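For a set of parallel scalar Gaussian channels [4-6], we consider

$$\mathbf{r} = \mathbf{x} + \mathbf{z}, \tag{1}$$

where $\mathbf{r}, \mathbf{x}, \mathbf{z} \in \mathbb{R}^N$. The vectors $\mathbf{r}$, $\mathbf{x}$, and $\mathbf{z}$ are the received signal, the input signal, and the i.i.d. Gaussian noise,
respectively. The additive Gaussian noise channel can also be described by the conditional distribution

$$f_{\mathbf{R}|\mathbf{X}}(\mathbf{r}|\mathbf{x}) = \prod_{i=1}^{N} f_{R|X}(r_i|x_i) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\mu_z}} \exp\left(-\frac{(r_i - x_i)^2}{2\mu_z}\right), \tag{2}$$

where $\mu_z$ is the variance of the Gaussian noise.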
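The channel model (1) and the separable density (2) translate directly into code. The sketch below (standard-library Python; `gaussian_channel` and `log_likelihood` are illustrative names of our own) adds i.i.d. $\mathcal{N}(0,\mu_z)$ noise and evaluates $\log f_{\mathbf{R}|\mathbf{X}}(\mathbf{r}|\mathbf{x})$ as a sum of scalar terms, which is the product structure of (2) in the log domain.

```python
import math
import random


def gaussian_channel(x, mu_z, rng=random):
    """Parallel scalar Gaussian channels (1): r_i = x_i + z_i with z_i ~ N(0, mu_z)."""
    sd = math.sqrt(mu_z)
    return [xi + rng.gauss(0.0, sd) for xi in x]


def log_likelihood(r, x, mu_z):
    """log f_{R|X}(r|x) from (2): the product over i becomes a sum of scalar log-densities."""
    if len(r) != len(x):
        raise ValueError("r and x must have the same length")
    const = -0.5 * math.log(2.0 * math.pi * mu_z)
    return sum(const - (ri - xi) ** 2 / (2.0 * mu_z) for ri, xi in zip(r, x))


rng = random.Random(0)
x = [0.0, 1.5, -0.3]
r = gaussian_channel(x, 0.1, rng)
print(log_likelihood(r, x, 0.1))
```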
For a linear system [7,8,11],

$$\mathbf{w} = \boldsymbol{\Phi}\mathbf{x}, \tag{3}$$

the random linear mixing matrix (or measurement matrix) $\boldsymbol{\Phi} \in \mathbb{R}^{M \times N}$ is known and its entries are i.i.d. Because
each component of the measurement vector $\mathbf{w} \in \mathbb{R}^M$ is a linear combination of the components of $\mathbf{x}$, we call the
system (3) a linear mixing system. The measurements $\mathbf{w}$ are passed through a bank of separable scalar channels
characterized by the conditional distributions

$$f_{\mathbf{Y}|\mathbf{W}}(\mathbf{y}|\mathbf{w}) = \prod_{i=1}^{M} f_{Y|W}(y_i|w_i), \tag{4}$$

where $\mathbf{y}$ is the channel output vector. Unlike the parallel scalar Gaussian channels (2), the channels (4)
for the linear mixing system are general and are not restricted to be Gaussian [14,15].

Our goal is to reconstruct the original system input $\mathbf{x}$ from the channel output $\mathbf{r}$ in (1), or from the output $\mathbf{y}$ and the
matrix $\boldsymbol{\Phi}$ in (3) and (4). To evaluate how accurate the reconstruction is, we calculate the error between the original
signal $\mathbf{x}$ and the reconstructed signal $\hat{\mathbf{x}}$. Many works emphasize the squared error performance [16-18]. In this
paper, however, we focus on preventing any significant errors during signal reconstruction. That is, we
want to study algorithms that minimize the $\ell_\infty$ norm of the error,

$$\|\mathbf{x} - \hat{\mathbf{x}}\|_\infty = \max_{i \in \{1,\ldots,N\}} |x_i - \hat{x}_i|.$$
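To see why the $\ell_\infty$ error behaves differently from the squared error, consider the small illustrative computation below (standard-library Python; the vectors are made up for illustration and are not taken from our experiments). Two estimates with the same squared error can have very different worst-case component errors.

```python
def linf_error(x, x_hat):
    """||x - x_hat||_inf = max_i |x_i - x_hat_i|, the worst-case component error."""
    if len(x) != len(x_hat):
        raise ValueError("x and x_hat must have the same length")
    return max(abs(a - b) for a, b in zip(x, x_hat))


def squared_error(x, x_hat):
    """Additive squared error sum_i (x_i - x_hat_i)^2, shown for comparison."""
    if len(x) != len(x_hat):
        raise ValueError("x and x_hat must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


x = [0.0, 0.0, 0.0, 0.0]
spread = [1.0, 1.0, 1.0, 1.0]  # small errors everywhere
spike = [2.0, 0.0, 0.0, 0.0]   # one large error
print(squared_error(x, spread), squared_error(x, spike))  # 4.0 4.0
print(linf_error(x, spread), linf_error(x, spike))        # 1.0 2.0
```

The squared error cannot tell the two estimates apart, while the $\ell_\infty$ error singles out the one with the large outlier.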



C. Related work 

In our previous work [19], we dealt with an additive error metric defined as

$$D(\mathbf{x}, \hat{\mathbf{x}}) = \sum_{i=1}^{N} d(x_i, \hat{x}_i). \tag{5}$$

We proposed a reconstruction algorithm that is optimal in minimizing the expected value of error metrics of the
form (5), where the reconstruction is done component-wise, i.e., for each $i \in \{1,2,\ldots,N\}$, $d(x_i,\hat{x}_i)$ is
minimized separately. In contrast to (5), however, the $\ell_\infty$ error is not additive, because it only considers the one
component with the largest absolute error, and thus it is not straightforward to extend the algorithm [19] to
minimize the $\ell_\infty$ error.

There have been a number of studies on general properties of $\ell_\infty$ error related solutions. An overdetermined
linear system $\mathbf{y} = \boldsymbol{\Phi}\mathbf{x}$, where $\boldsymbol{\Phi} \in \mathbb{R}^{n \times m}$ and $n > m$, was considered by Cadzow [20], and the properties of the
minimum $\ell_\infty$ error solutions to this system were explored. Clark [21] developed a way to calculate
the distribution of the greatest element in a finite set of random variables. Indyk [22] introduced an algorithm
to find the nearest neighbor of a point under the $\ell_\infty$ norm distance. Finally, Lounici [23]
studied the $\ell_\infty$ error convergence rate for the Lasso and Dantzig estimators.

D. Contributions 

Our first result is asymptotic in nature; we prove that, in parallel scalar Gaussian channels where the input signal
is i.i.d. sparse Gaussian or i.i.d. mixture Gaussian, the Wiener filter [24] is asymptotically optimal for $\ell_\infty$ norm error.
These results are extended to linear mixing systems based on the assumption that the relaxed BP algorithm [15]
matches Tanaka's fixed point equation [16,25-29]. We claim that in linear mixing systems, when the input signal
is i.i.d. sparse Gaussian or i.i.d. mixture Gaussian, applying the Wiener filter to the outputs of the relaxed BP
algorithm is asymptotically optimal for $\ell_\infty$ norm error.

Our second result is practical in nature; to deal with signals of finite length $N$ in practice, we apply the $\ell_p$
error minimization of [19], and show numerically that, with a finite signal length $N$, the $\ell_p$ error minimization [19]
outperforms Wiener filtering.

The remainder of the paper is arranged as follows. We review the metric-optimal algorithm along with the relaxed
BP algorithm in Section II, and then discuss our main results in Section III. Simulation results are given in Section
IV, and Section V concludes. Proofs of the main results appear in the appendices.

II. Review of the Relaxed Belief Propagation and Metric-Optimal Algorithms 

The set of parallel scalar Gaussian channels (1) has a simple structure, because each channel $r_i = x_i + z_i$ is
separable from the other scalar channels, and thus the analysis of the system model (1) is convenient. Linear mixing
systems, however, are more complicated. Previous works [15-17,25-30] have shown that a linear mixing system (3)
and (4) can be decoupled into parallel scalar Gaussian channels. In this section, we review the decoupling process
of linear mixing systems, as well as the metric-optimal algorithm that is based on the decoupling.

Fig. 1: The structure of the metric-optimal estimation algorithm. [Diagram: a "Vector estimation" stage (relaxed BP) produces $\mathbf{q}$; a "Scalar estimation" stage uses $f_{\mathbf{X}|\mathbf{Q}}(\mathbf{x}|\mathbf{q})$ and the error metric $D(\mathbf{x},\hat{\mathbf{x}})$ to compute $\min_{\hat{\mathbf{x}}} E[D(\mathbf{x},\hat{\mathbf{x}})|\mathbf{q}]$.]

There are different versions of the relaxed belief propagation (BP) algorithm in the literature [15,17,30]; our
proposed algorithm [19] is based on the one by Rangan [15], specifically the software package "GAMP" [31].
An important result for the relaxed BP algorithm in linear mixing system problems [15] is that, after a sufficient
number of iterations, the relaxed BP process calculates a vector $\mathbf{q} = [q_1, q_2, \ldots, q_N]^T \in \mathbb{R}^N$, and estimating
the inputs $\mathbf{x}$ from the outputs $\mathbf{y}$ of the linear mixing system (3) and (4) is then asymptotically statistically equivalent to
estimating each input entry $x_i$ from the corresponding $q_i$, where $q_i$ is regarded as the output of a scalar Gaussian
channel:

$$q_i = x_i + v_i \tag{6}$$

for $i \in \{1,\ldots,N\}$, where each channel's additive noise $v_i$ is Gaussian distributed, $\mathcal{N}(0,\mu_v)$, and $\mu_v$ satisfies
Tanaka's fixed point equation [16,25-29]. The value of $\mu_v$ can also be obtained from the relaxed BP process [15].
We note in passing that when we discuss $q_i = x_i + v_i$, these are the parallel scalar channels resulting from the
relaxed BP algorithm, and when we discuss $r_i = x_i + z_i$, these are the true parallel Gaussian channels (1).

In previous related work [19], we utilized the outputs of the relaxed BP algorithm [15], and introduced a general
metric-optimal estimation algorithm that deals with arbitrary error metrics. Figure 1 illustrates the structure of our
metric-optimal algorithm (dashed box). The algorithm is essentially a scalar estimation process, whereas the relaxed
BP algorithm deals with the vector estimation.

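We first compute the conditional probability density function $f_{\mathbf{X}|\mathbf{Q}}(\mathbf{x}|\mathbf{q})$ from Bayes' rule:

$$f_{\mathbf{X}|\mathbf{Q}}(\mathbf{x}|\mathbf{q}) = \frac{f_{\mathbf{Q}|\mathbf{X}}(\mathbf{q}|\mathbf{x})\, f_{\mathbf{X}}(\mathbf{x})}{\int f_{\mathbf{Q}|\mathbf{X}}(\mathbf{q}|\mathbf{x})\, f_{\mathbf{X}}(\mathbf{x})\, d\mathbf{x}}. \tag{7}$$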
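For the inputs considered in this paper, (7) has a closed form at the scalar level. Assuming the i.i.d. mixture Gaussian prior of Section I-B and the decoupled channel $q_i = x_i + v_i$ of (6), the posterior of $x_i$ given $q_i$ is again a Gaussian mixture: component $k$ has weight proportional to $s_k\,\mathcal{N}(q_i; 0, \mu_k + \mu_v)$, mean $\frac{\mu_k}{\mu_k+\mu_v} q_i$, and variance $\frac{\mu_k \mu_v}{\mu_k + \mu_v}$. The sketch below (standard-library Python; `mixture_posterior` is a hypothetical helper, not part of GAMP [31] or of our package [35]) computes these quantities, normalizing in the log domain to avoid underflow.

```python
import math


def mixture_posterior(q, weights, variances, mu_v):
    """Scalar posterior f(x | q) for x ~ sum_k s_k N(0, mu_k) observed as q = x + v, v ~ N(0, mu_v).

    Returns a list of (weight, mean, variance) triples describing the Gaussian
    mixture posterior; a component with mu_k = 0 becomes a point mass at 0.
    """
    # log of s_k * N(q; 0, mu_k + mu_v): the k-th term of the numerator of Bayes' rule (7)
    logs = [math.log(s) - 0.5 * math.log(2.0 * math.pi * (mu + mu_v)) - q * q / (2.0 * (mu + mu_v))
            for s, mu in zip(weights, variances)]
    top = max(logs)
    unnorm = [math.exp(l - top) for l in logs]
    total = sum(unnorm)  # proportional to the denominator (evidence) in (7)
    post = []
    for u, mu in zip(unnorm, variances):
        gain = mu / (mu + mu_v)  # per-component Wiener gain
        post.append((u / total, gain * q, gain * mu_v))
    return post


def posterior_mean(post):
    """E[x | q]: the weighted average of the component means."""
    return sum(w * m for w, m, _ in post)


print(mixture_posterior(1.5, [0.2, 0.3, 0.5], [10.0, 1.0, 0.5], 0.1))
```

With a single component ($K = 1$), the posterior mean reduces to the Wiener filter $\frac{\mu_x}{\mu_x+\mu_v} q_i$.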

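Then, given an additive error metric $D(\mathbf{x},\hat{\mathbf{x}})$, the optimal estimand $\hat{\mathbf{x}}_{\mathrm{opt}}$ is generated by minimizing the conditional
expectation of the error metric, $E[D(\mathbf{x},\hat{\mathbf{x}})|\mathbf{q}]$:

$$\hat{\mathbf{x}}_{\mathrm{opt}} = \arg\min_{\hat{\mathbf{x}}} E[D(\mathbf{x},\hat{\mathbf{x}})|\mathbf{q}] = \arg\min_{\hat{\mathbf{x}}} \int D(\mathbf{x},\hat{\mathbf{x}})\, f_{\mathbf{X}|\mathbf{Q}}(\mathbf{x}|\mathbf{q})\, d\mathbf{x}. \tag{8}$$

In the large system limit, the estimand satisfying (8) is asymptotically optimal, because it minimizes the conditional
expectation of the error metric.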



Because both the error metric function $D(\mathbf{x},\hat{\mathbf{x}})$ and the conditional probability function $f_{\mathbf{X}|\mathbf{Q}}(\mathbf{x}|\mathbf{q})$ are separable,
the problem reduces to scalar estimation [32]. The estimand $\hat{\mathbf{x}}_{\mathrm{opt}}$ is solved in a component-wise fashion:

$$\hat{x}_{\mathrm{opt},i} = \arg\min_{\hat{x}_i} E[d(x_i,\hat{x}_i)|q_i] = \arg\min_{\hat{x}_i} \int d(x_i,\hat{x}_i)\, f(x_i|q_i)\, dx_i, \tag{9}$$

for each $\hat{x}_i$, $i \in \{1,2,\ldots,N\}$. This scalar estimation is easy and fast to implement.
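Two familiar special cases illustrate (9). The worked derivation below assumes the posterior $f(x_i|q_i)$ has a density, so that differentiation under the integral is permitted; it is a standard textbook calculation rather than part of the algorithm in [19].

```latex
% Squared error, d(x_i, \hat{x}_i) = (x_i - \hat{x}_i)^2:
\frac{\partial}{\partial \hat{x}_i}\int (x_i-\hat{x}_i)^2 f(x_i|q_i)\,dx_i
  = -2\int (x_i-\hat{x}_i)\, f(x_i|q_i)\,dx_i = 0
  \;\Longrightarrow\; \hat{x}_{\mathrm{opt},i} = E[x_i|q_i].

% Absolute error, d(x_i, \hat{x}_i) = |x_i - \hat{x}_i|:
\frac{\partial}{\partial \hat{x}_i}\int |x_i-\hat{x}_i|\, f(x_i|q_i)\,dx_i
  = \Pr(x_i<\hat{x}_i \mid q_i) - \Pr(x_i>\hat{x}_i \mid q_i) = 0
  \;\Longrightarrow\; \hat{x}_{\mathrm{opt},i} = \operatorname{median}(x_i \mid q_i).
```

These are the $p = 2$ and $p = 1$ instances of the minimum mean $\ell_p$ estimator (11).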

Because the $\ell_\infty$ error only considers the component with the greatest absolute value and does not have
the additive form (5), it is natural to turn to the $\ell_p$ norm error as an alternative.

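Recall that the $\ell_p$ norm error between $\mathbf{x}$ and $\hat{\mathbf{x}}$ is defined as

$$\|\mathbf{x}-\hat{\mathbf{x}}\|_p = \left(\sum_{i \in \{1,\ldots,N\}} |x_i - \hat{x}_i|^p\right)^{1/p}.$$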

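This type of error is closely related to our definition of the error metric (5). We define

$$D_p(\mathbf{x},\hat{\mathbf{x}}) = \sum_{i=1}^{N} |x_i - \hat{x}_i|^p = \|\mathbf{x}-\hat{\mathbf{x}}\|_p^p,$$

and let $\hat{\mathbf{x}}_p$ denote the estimand that minimizes the conditional expectation of $D_p(\mathbf{x},\hat{\mathbf{x}})$, i.e.,

$$\hat{\mathbf{x}}_p = \arg\min_{\hat{\mathbf{x}}} E[D_p(\mathbf{x},\hat{\mathbf{x}})|\mathbf{q}] = \arg\min_{\hat{\mathbf{x}}} E\left[\|\mathbf{x}-\hat{\mathbf{x}}\|_p^p \,\middle|\, \mathbf{q}\right], \tag{10}$$

and

$$\hat{x}_{p,i} = \arg\min_{\hat{x}_i} E\left[|x_i - \hat{x}_i|^p \,\middle|\, q_i\right], \tag{11}$$

for $i \in \{1,2,\ldots,N\}$.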
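A minimal numerical sketch of (11), assuming the mixture Gaussian prior and the scalar Gaussian channel (6): the posterior of $x_i$ given $q_i$ is then a Gaussian mixture (computed by `mixture_posterior` below), $E[|x_i-\hat{x}_i|^p|q_i]$ is evaluated with a fixed trapezoidal quadrature, and the minimizer is found by golden-section search. For $p \ge 1$ each component's expected loss is convex and minimized at that component's posterior mean, so the minimizer lies between the smallest and largest component means. The quadrature grid, tolerance, and function names are our own choices and are not taken from the package [35].

```python
import math


def mixture_posterior(q, weights, variances, mu_v):
    """(weight, mean, variance) triples of f(x | q) for x ~ sum_k s_k N(0, mu_k), q = x + v."""
    logs = [math.log(s) - 0.5 * math.log(2.0 * math.pi * (mu + mu_v)) - q * q / (2.0 * (mu + mu_v))
            for s, mu in zip(weights, variances)]
    top = max(logs)
    unnorm = [math.exp(l - top) for l in logs]
    total = sum(unnorm)
    return [(u / total, mu / (mu + mu_v) * q, mu * mu_v / (mu + mu_v))
            for u, mu in zip(unnorm, variances)]


# Trapezoidal rule for E[g(Z)], Z ~ N(0, 1), on [-8, 8] with step _H.
_H = 0.02
_Z = [-8.0 + _H * j for j in range(801)]
_WZ = [(0.5 if j in (0, 800) else 1.0) * _H * math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)
       for j, z in enumerate(_Z)]


def expected_lp_loss(a, post, p):
    """E[|x - a|^p | q] for a posterior given as (weight, mean, variance) triples."""
    total = 0.0
    for w, m, tau in post:
        if tau == 0.0:  # point mass at m
            total += w * abs(m - a) ** p
        else:
            sd = math.sqrt(tau)
            total += w * sum(wz * abs(m + sd * z - a) ** p for z, wz in zip(_Z, _WZ))
    return total


def min_mean_lp_estimate(q, weights, variances, mu_v, p, tol=1e-9):
    """Scalar estimate (11): argmin_a E[|x - a|^p | q] by golden-section search (needs p >= 1)."""
    post = mixture_posterior(q, weights, variances, mu_v)
    lo = min(m for _, m, _ in post)
    hi = max(m for _, m, _ in post)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        a1 = hi - ratio * (hi - lo)
        a2 = lo + ratio * (hi - lo)
        if expected_lp_loss(a1, post, p) < expected_lp_loss(a2, post, p):
            hi = a2  # the convex loss has its minimizer in [lo, a2]
        else:
            lo = a1  # the minimizer lies in [a1, hi]
    return 0.5 * (lo + hi)


# Example: sparse Gaussian prior (s = 0.05, mu_x = 1), mu_v = 5e-4, observation q = 0.2.
for p in (2, 5, 15):
    print(p, min_mean_lp_estimate(0.2, [0.05, 0.95], [1.0, 0.0], 5e-4, p))
```

For $p = 2$ the search returns the posterior mean, which is a convenient sanity check; larger $p$ weights the tails of the posterior more heavily.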

Although $\hat{\mathbf{x}}_p$ minimizes the $(\ell_p)^p$ error rather than the $\ell_p$ error, we call $\hat{\mathbf{x}}_p$ the minimum mean $\ell_p$ norm error
estimator for simplicity.

Because it can be shown that

$$\lim_{p\to\infty} \|\mathbf{x}-\hat{\mathbf{x}}\|_p = \|\mathbf{x}-\hat{\mathbf{x}}\|_\infty, \tag{12}$$

it is reasonable to expect that if we set $p$ to a large value, then running our metric-optimal algorithm with the error
metric $D_p(\cdot)$ in (10) will give a solution that converges to an estimand that minimizes the $\ell_\infty$ error.

III. Main Results 

In this section, we first study the minimum mean $\ell_\infty$ error estimator for parallel scalar Gaussian channels (1),
then discuss the minimum mean $\ell_\infty$ error estimator for linear mixing systems (3) and (4), and finally analyze the
performance of the minimum mean $\ell_p$ norm error estimators (10) in terms of the $\ell_\infty$ norm error.



A. The minimum mean $\ell_\infty$ error estimator for parallel scalar Gaussian channels

For a set of parallel scalar Gaussian channels (1), the minimum mean squared error estimator, i.e., $p = 2$ in (10)
(here we replace $\mathbf{q}$ by $\mathbf{r}$ for the scalar Gaussian channel discussion), is given by the conditional expectation $E[\mathbf{x}|\mathbf{r}]$.
To build intuition into the problem of finding $E[\mathbf{x}|\mathbf{r}]$, suppose for simplicity that $\mathbf{x}$ is Gaussian (not mixture
Gaussian), i.e., $\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \mu_x \cdot \mathbf{I}_N)$ and $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mu_z \cdot \mathbf{I}_N)$, where $\mathbf{I}_N$ is the $N \times N$ identity matrix. Then the
estimand $\hat{\mathbf{x}}_2 = E[\mathbf{x}|\mathbf{r}] = \frac{\mu_x}{\mu_x+\mu_z}\mathbf{r}$ gives the minimum mean squared error. The form $\frac{\mu_x}{\mu_x+\mu_z}\mathbf{r}$ is called the Wiener
filter in signal processing [24]. Sherman [33] has shown that, when both the input signal vector and the
parallel channel noise are Gaussian, the linear Wiener filter is also optimal for all $\ell_p$ norm errors ($p \ge 1$),
including the $\ell_\infty$ norm error. Surprisingly, we find that, if the input signal is i.i.d. sparse Gaussian or i.i.d. mixture
Gaussian, the Wiener filter asymptotically minimizes the $\ell_\infty$ error. Our main results follow.
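For completeness, the Gaussian case quoted above follows from a standard computation. Assuming $x_i \sim \mathcal{N}(0,\mu_x)$ and $z_i \sim \mathcal{N}(0,\mu_z)$ are independent, $(x_i, r_i)$ is jointly Gaussian, so the conditional mean is linear in $r_i$:

```latex
\operatorname{Cov}(x_i, r_i) = E[x_i(x_i+z_i)] = \mu_x, \qquad
\operatorname{Var}(r_i) = \mu_x + \mu_z,

E[x_i \mid r_i] = \frac{\operatorname{Cov}(x_i, r_i)}{\operatorname{Var}(r_i)}\, r_i
               = \frac{\mu_x}{\mu_x + \mu_z}\, r_i, \qquad
\operatorname{Var}(x_i \mid r_i) = \mu_x - \frac{\mu_x^2}{\mu_x+\mu_z} = \frac{\mu_x \mu_z}{\mu_x + \mu_z}.
```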

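Theorem 1. In a set of parallel scalar Gaussian channels (1), if the input signal $\mathbf{x}$ is i.i.d. sparse Gaussian,
i.e., $x_i \sim s \cdot \mathcal{N}(0,\mu_x) + (1-s)\,\delta_0(x_i)$, then the Wiener filter

$$\hat{\mathbf{x}}_{W,SG} = \frac{\mu_x}{\mu_x + \mu_z}\,\mathbf{r}$$

is asymptotically optimal for $\ell_\infty$ error with probability 1. More specifically,

$$\Pr\left\{ E\left[\lim_{N\to\infty} \|\mathbf{x}-\hat{\mathbf{x}}_{W,SG}\|_\infty \,\middle|\, \mathbf{r}\right] \le E\left[\lim_{N\to\infty} \|\mathbf{x}-\hat{\mathbf{x}}\|_\infty \,\middle|\, \mathbf{r}\right] \right\} = 1,$$

where $\hat{\mathbf{x}}$ is any arbitrary estimand.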

Theorem 1 is proved in Appendix A. The main idea of the proof is to show that, asymptotically, the maximum
absolute error lies at an index $i$ such that $x_i$ is nonzero, i.e., $\|\mathbf{x}-\hat{\mathbf{x}}\|_\infty = |x_i - \hat{x}_i|$ for some $i$ with $x_i \neq 0$. Therefore, minimizing
the maximum absolute error between the estimand $\hat{\mathbf{x}}$ and the entire vector $\mathbf{x} = [x_1, x_2, \ldots, x_N]^T$ is equivalent
to minimizing the maximum absolute error between the estimand and the subvector $\tilde{\mathbf{x}} = [x_{i_1}, x_{i_2}, \ldots, x_{i_d}]^T$,
where the $x_{i_j}$'s are nonzero and Gaussian distributed and $d$ is the number of nonzero elements in $\mathbf{x}$. For this
i.i.d. Gaussian subvector, the Wiener filter minimizes the $\ell_\infty$ error [33].

Theorem 1 only applies to $s$-sparse Gaussian signals, but it can be easily extended to the mixture Gaussian
distribution, thus significantly enhancing the applicability of our result. In the mixture Gaussian input case, the
maximum absolute error between $\mathbf{x}$ and the estimand $\hat{\mathbf{x}}$ lies at an index $i$ that corresponds to the Gaussian mixture
component with the greatest variance.

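Theorem 2. In a set of parallel scalar Gaussian channels (1), if the input signal $\mathbf{x}$ is i.i.d. mixture Gaussian,
i.e., $x_i \sim s_1 \cdot \mathcal{N}(0,\mu_1) + s_2 \cdot \mathcal{N}(0,\mu_2) + \cdots + s_K \cdot \mathcal{N}(0,\mu_K)$, where $s_1, s_2, \ldots, s_K > 0$ and $\sum_{k=1}^{K} s_k = 1$, then the
Wiener filter

$$\hat{\mathbf{x}}_{W,MG} = \frac{\mu_{\max}}{\mu_{\max} + \mu_z}\,\mathbf{r}$$

is asymptotically optimal for $\ell_\infty$ error with probability 1, where $\mu_{\max} = \max_{k \in \{1,2,\ldots,K\}} \mu_k$. More specifically,

$$\Pr\left\{ E\left[\lim_{N\to\infty} \|\mathbf{x}-\hat{\mathbf{x}}_{W,MG}\|_\infty \,\middle|\, \mathbf{r}\right] \le E\left[\lim_{N\to\infty} \|\mathbf{x}-\hat{\mathbf{x}}\|_\infty \,\middle|\, \mathbf{r}\right] \right\} = 1,$$

where $\hat{\mathbf{x}}$ is any arbitrary estimand.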
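Theorem 2 reduces the estimator to a single scalar gain. A minimal sketch, assuming the mixture variances and $\mu_z$ are known (the function name `wiener_mixture` is our own):

```python
def wiener_mixture(r, variances, mu_z):
    """Estimator of Theorem 2: x_hat = mu_max / (mu_max + mu_z) * r, with mu_max = max_k mu_k.

    Only the largest component variance enters; the mixture weights s_k do not.
    """
    if mu_z <= 0:
        raise ValueError("noise variance mu_z must be positive")
    mu_max = max(variances)
    gain = mu_max / (mu_max + mu_z)
    return [gain * ri for ri in r]


# The sparse Gaussian of Theorem 1 is the case variances = [mu_x, 0]: the gain is mu_x / (mu_x + mu_z).
print(wiener_mixture([1.0, -2.0], [1.0, 0.0], 1.0))  # [0.5, -1.0]
```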

The proof of Theorem 2 is given in Appendix B. 

B. The minimum mean $\ell_\infty$ error estimator for linear mixing systems

We discussed in Section II that, using the statistical information of the linear mixing system (3) and (4), the
relaxed BP algorithm asymptotically computes a set of equivalent parallel scalar Gaussian channels. Therefore, by taking
the outputs of the relaxed BP algorithm, i.e., the scalar Gaussian channel output vector $\mathbf{q}$ and the noise variance $\mu_v$,
and then applying the Wiener filter, we obtain an estimand that is asymptotically optimal in the $\ell_\infty$ error sense
for the linear mixing system (3) and (4). Because the analysis of the equivalent scalar Gaussian channels (6) relies
on the replica method [16], which has only been rigorously justified in specific settings [34], we state our result
below as a claim.

Claim 1. Given the system model described by (3) and (4), where the input signal $\mathbf{x}$ is i.i.d. mixture Gaussian distributed (sparse
Gaussian is a specific case), $x_i \sim s_1 \cdot \mathcal{N}(0,\mu_1) + s_2 \cdot \mathcal{N}(0,\mu_2) + \cdots + s_K \cdot \mathcal{N}(0,\mu_K)$, where $s_1, s_2, \ldots, s_K > 0$
and $\sum_{k=1}^{K} s_k = 1$, as the signal dimension $N \to \infty$ with the measurement ratio $M/N$ fixed, the estimand

$$\hat{\mathbf{x}}_{W,MG,BP} = \frac{\mu_{\max}}{\mu_{\max} + \mu_v}\,\mathbf{q}$$

is asymptotically optimal for $\ell_\infty$ error with probability 1, where $\mathbf{q}$ and $\mu_v$ (6) are the outputs of the relaxed BP
algorithm, and $\mu_{\max} = \max_{k \in \{1,2,\ldots,K\}} \mu_k$.

The relaxed BP algorithm [15] always decouples the linear mixing system (3) and (4) into parallel scalar Gaussian
channels, regardless of which type of channel (4) describes the system. This feature allows more flexibility in the channel
types of linear mixing systems (4) than in scalar Gaussian channels (2).

C. The approximation of the minimum mean $\ell_\infty$ error estimator

The Wiener filter is asymptotically optimal for $\ell_\infty$ error, and one may wonder whether its performance is
satisfactory for a finite signal length $N$. Readers will see in Appendix A that the Wiener filter is
asymptotically optimal with a convergence rate on the order of $\sqrt{\ln(N)}$, which suggests that the convergence
is slow. We are therefore motivated to compare the performance of the Wiener filter with that of the minimum mean $\ell_p$
norm error estimators (10) in terms of the $\ell_\infty$ norm error. Indeed, the numerical results in Section IV indicate that
the minimum mean $\ell_p$ error estimator achieves a lower $\ell_\infty$ error than the Wiener filter, provided the value of $p$ is
properly chosen. Keeping (12) in mind, one would expect that, for any positive integers $p_1, p_2$ with $p_1 > p_2$,
$\hat{\mathbf{x}}_{p_1}$ always achieves a lower $\ell_\infty$ error than $\hat{\mathbf{x}}_{p_2}$ does. However, experiments have indicated that, for a fixed signal
dimension $N$, $\min_p E[\|\hat{\mathbf{x}}_p - \mathbf{x}\|_\infty|\mathbf{q}]$ can be achieved by a finite $p$. We include numerical results in Section IV,
and state our conjecture here.

Conjecture 1. Given that a system is modeled by (1) or by (3) and (4), where the input $\mathbf{x}$ is sparse Gaussian or
mixture Gaussian and $\hat{\mathbf{x}}_p$ is obtained by (10), then for any fixed signal dimension $N > 0$, there exists an integer
$p_{\mathrm{opt}}$ such that $E[\|\hat{\mathbf{x}}_{p_{\mathrm{opt}}} - \mathbf{x}\|_\infty|\mathbf{q}] \le E[\|\hat{\mathbf{x}}_p - \mathbf{x}\|_\infty|\mathbf{q}]$ for all positive integers $p$. Moreover, as the signal dimension
$N$ increases, the value of $p_{\mathrm{opt}}$ increases.

Remark. In this paper, our focus is on i.i.d. mixture Gaussian input distributions. However, additional numerical
results in Section IV suggest that Conjecture 1 also applies to other types of input distributions.

Conjecture 1 indicates that, for a fixed signal dimension, the minimum mean $\ell_p$ norm error estimators (10) with
different values of $p$ reduce the $\ell_\infty$ error by different amounts, and that the optimal value of $p$ for a bounded signal
dimension is also bounded. The conjecture also implies that $p_{\mathrm{opt}}$ is a function of the signal dimension
$N$. An intuitive explanation of Conjecture 1 is that, as the signal dimension increases, the probability that larger
errors occur also increases, and thus a larger $p$ in (10) is needed to suppress larger outliers.

IV. Numerical Results 

In this section, we provide the simulation results that inspired our results in Section III-C. Again, we first present 
the simulation results for parallel scalar channels, and then for linear mixing systems. 

A. Parallel scalar channels 

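We first test the parallel scalar Gaussian channels $\mathbf{r} = \mathbf{x} + \mathbf{z}$ (1), where the input $\mathbf{x}$ is i.i.d. mixture Gaussian,
$x_i \sim 0.2 \cdot \mathcal{N}(0,10) + 0.3 \cdot \mathcal{N}(0,1) + 0.5 \cdot \mathcal{N}(0,0.5)$, and the noise is $z_i \sim \mathcal{N}(0,0.1)$. For this mixture Gaussian
signal, there are three Wiener filters corresponding to the three different input variances: $\hat{\mathbf{x}}_{W1} = \frac{10}{10+0.1}\mathbf{r}$, $\hat{\mathbf{x}}_{W2} = \frac{1}{1+0.1}\mathbf{r}$,
and $\hat{\mathbf{x}}_{W3} = \frac{0.5}{0.5+0.1}\mathbf{r}$. In Figure 2, we compare the $\ell_\infty$ error of $\hat{\mathbf{x}}_{W1}$, $\hat{\mathbf{x}}_{W2}$, and $\hat{\mathbf{x}}_{W3}$. It can be seen that $\hat{\mathbf{x}}_{W1}$, which
corresponds to the Gaussian input component with the largest variance, achieves the lowest $\ell_\infty$ error among the three
Wiener filters. This result is consistent with Theorem 2.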
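The comparison in Figure 2 can be reproduced in spirit with a short Monte Carlo sketch. It uses the same mixture and noise variance as above but otherwise relies on our own choices (number of trials, seed, function name); it is not the code that produced Figure 2, and its numbers will differ from the figure. The three gains are $10/10.1 \approx 0.9901$, $1/1.1 \approx 0.9091$, and $0.5/0.6 \approx 0.8333$.

```python
import math
import random


def linf_errors_of_wiener_filters(n, weights, variances, mu_z, trials=20, seed=0):
    """Average l_inf error of the linear filters x_hat = mu_k / (mu_k + mu_z) * r, one per mu_k.

    Each trial draws x from the i.i.d. mixture Gaussian prior and r = x + z with z ~ N(0, mu_z).
    """
    rng = random.Random(seed)
    gains = [mu / (mu + mu_z) for mu in variances]
    totals = [0.0] * len(gains)
    sd_z = math.sqrt(mu_z)
    for _ in range(trials):
        comps = rng.choices(range(len(weights)), weights=weights, k=n)
        x = [rng.gauss(0.0, math.sqrt(variances[k])) for k in comps]
        r = [xi + rng.gauss(0.0, sd_z) for xi in x]
        for idx, c in enumerate(gains):
            totals[idx] += max(abs(xi - c * ri) for xi, ri in zip(x, r))
    return gains, [t / trials for t in totals]


for n in (100, 1_000, 10_000):
    gains, errs = linf_errors_of_wiener_filters(n, [0.2, 0.3, 0.5], [10.0, 1.0, 0.5], 0.1)
    print(n, [round(c, 4) for c in gains], [round(e, 3) for e in errs])
```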

We then test a set of parallel scalar Gaussian channels where the input is i.i.d. sparse Gaussian. The sparsity
rate is $s = 5\%$, and the nonzero input elements are i.i.d. $\mathcal{N}(0,1)$ distributed, i.e., $x_i \sim \mathcal{N}(0,1)$ if $x_i \neq 0$, while the
Gaussian noise is i.i.d., $z_i \sim \mathcal{N}(0, 5 \times 10^{-4})$; note that the signal-to-noise ratio (SNR) is 20 dB. Here the Wiener
filter is $\mathbf{r}/(1 + 5 \times 10^{-4})$. We also obtain the minimum mean $\ell_5$, $\ell_{10}$, and $\ell_{15}$ error estimators, $\hat{\mathbf{x}}_5$, $\hat{\mathbf{x}}_{10}$, and $\hat{\mathbf{x}}_{15}$,
using equation (10), where we replace $\mathbf{q}$ by $\mathbf{r}$.
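The 20 dB figure can be checked directly, assuming the SNR is defined as the average per-entry signal power $E[x_i^2] = s\mu_x$ divided by the noise variance (the definition is not stated explicitly in the text):

```latex
\mathrm{SNR} = 10\log_{10}\frac{E[x_i^2]}{\mu_z}
            = 10\log_{10}\frac{s\,\mu_x}{\mu_z}
            = 10\log_{10}\frac{0.05 \times 1}{5\times 10^{-4}}
            = 10\log_{10} 100 = 20\ \mathrm{dB},
\qquad
\hat{\mathbf{x}}_{W,SG} = \frac{\mu_x}{\mu_x+\mu_z}\,\mathbf{r} = \frac{\mathbf{r}}{1 + 5\times 10^{-4}}.
```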

Figure 3 compares the $\ell_\infty$ error achieved by the Wiener filter and the minimum mean $\ell_5$, $\ell_{10}$, and $\ell_{15}$ estimators.
The results in Figure 3 are consistent with Conjecture 1. When $N < 300$, $\hat{\mathbf{x}}_5$ has the lowest $\ell_\infty$ error among
all four estimators; when $300 < N < 3{,}000$, $\hat{\mathbf{x}}_{10}$ achieves the smallest $\ell_\infty$ error; and when $N > 3{,}000$, $\hat{\mathbf{x}}_{15}$
outperforms the others. Figure 3 also shows that the slope of the "Wiener filter" line is smaller than the slopes of "$\hat{\mathbf{x}}_5$", "$\hat{\mathbf{x}}_{10}$",
and "$\hat{\mathbf{x}}_{15}$", which suggests that the Wiener filter is asymptotically optimal for $\ell_\infty$ error.

Fig. 2: The performance of the three Wiener filters corresponding to three different Gaussian components in parallel scalar
Gaussian channels. The input vector is i.i.d., $x_i \sim 0.2 \cdot \mathcal{N}(0,10) + 0.3 \cdot \mathcal{N}(0,1) + 0.5 \cdot \mathcal{N}(0,0.5)$. The Wiener filter $\hat{\mathbf{x}}_{W1}$
that corresponds to the first mixture component $\mathcal{N}(0,10)$ achieves the lowest $\ell_\infty$ error.



B. Linear mixing system 

We perform simulations for linear mixing systems (3) and (4) using the software package "GAMP" [31] and our
metric-optimal algorithm [35]. Our metric-optimal algorithm package [35] automatically computes equations (7)-(9),
where the distortion function (5) is given as an input to the algorithm.

In all the following simulations, the input signals are sparse with sparsity rate 5%, and the measurement matrices $\boldsymbol{\Phi}$
are Bernoulli(0.5) and are normalized to have unit-norm rows. We consider three different combinations of input
distributions and channel distributions (4); in all cases the SNR is 20 dB:

1) The nonzero input entries are Gaussian, $\mathcal{N}(0,1)$, and the channel is Gaussian.

2) The nonzero input entries are Weibull distributed,

$$f(x_i;\lambda,k) = \begin{cases} \frac{k}{\lambda}\left(\frac{x_i}{\lambda}\right)^{k-1} e^{-(x_i/\lambda)^k}, & x_i \ge 0, \\ 0, & x_i < 0, \end{cases} \tag{13}$$

where $\lambda = 1$ and $k = 0.5$, and the channel is Gaussian.

3) The nonzero input entries are Weibull distributed (13), and the channel is Poisson,

$$f_{Y|W}(y_i|w_i) = \frac{(\alpha w_i)^{y_i} e^{-\alpha w_i}}{y_i!},$$

where the scaling factor of the input is $\alpha = 100$.
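The densities in cases 2) and 3) can be evaluated directly. The sketch below (standard-library Python; the function names are hypothetical) computes the Weibull density (13) and the Poisson log-likelihood $\log f_{Y|W}(y_i|w_i) = y_i\log(\alpha w_i) - \alpha w_i - \log(y_i!)$, using `math.lgamma(y + 1)` for $\log(y_i!)$. It assumes $w_i \ge 0$, which holds here because the Weibull inputs and the Bernoulli(0.5) matrix entries are nonnegative. Python's `random.weibullvariate(alpha, beta)` takes the scale first and the shape second, so $\lambda = 1$, $k = 0.5$ corresponds to `weibullvariate(1.0, 0.5)`.

```python
import math
import random


def weibull_pdf(x, lam=1.0, k=0.5):
    """Weibull density (13): (k/lam) (x/lam)^(k-1) exp(-(x/lam)^k) for x >= 0, and 0 for x < 0."""
    if x < 0:
        return 0.0
    if x == 0:
        # At x = 0 the density is infinite for k < 1, equals 1/lam for k = 1, and is 0 for k > 1.
        return math.inf if k < 1 else (1.0 / lam if k == 1 else 0.0)
    t = x / lam
    return (k / lam) * t ** (k - 1) * math.exp(-t ** k)


def poisson_loglik(y, w, alpha=100.0):
    """log f_{Y|W}(y | w) = y log(alpha w) - alpha w - log(y!) for a count y >= 0 and w >= 0."""
    rate = alpha * w
    if rate == 0.0:
        return 0.0 if y == 0 else -math.inf
    return y * math.log(rate) - rate - math.lgamma(y + 1)


# A sparse Weibull input entry: nonzero with probability 0.05, then Weibull with lambda = 1, k = 0.5.
rng = random.Random(0)
x_i = rng.weibullvariate(1.0, 0.5) if rng.random() < 0.05 else 0.0
print(x_i, weibull_pdf(1.0), poisson_loglik(2, 0.01))
```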
Fig. 3: [...] scalar Gaussian channels. The optimal $p_{\mathrm{opt}}$ increases as $N$ increases. (Sparse Gaussian input, sparsity rate is 5%, and SNR is 20 dB.)



For the system dimension, we fix the ratio $M/N = 0.3$, and let $N$ range from 500 to 20,000. We then run the
Wiener filter (only in the case of sparse Gaussian input and Gaussian channel, because the Wiener filter does not
apply to sparse Weibull distributed inputs), relaxed BP [15,31], and our metric-optimal algorithm with $p = 5, 10, 15$
in (10).

To compare the performance of the Wiener filter with $\hat{\mathbf{x}}_p$ (10), and also to illustrate how $p_{\mathrm{opt}}$ is related to the
signal dimension $N$, we present in Figure 4 the $\ell_\infty$ norm error of the Wiener filter and of the minimum mean $\ell_5$,
$\ell_{10}$, and $\ell_{15}$ norm error estimators, i.e., $\hat{\mathbf{x}}_5$, $\hat{\mathbf{x}}_{10}$, and $\hat{\mathbf{x}}_{15}$ (10). The numerical results shown in Figure 4 are similar
to those shown in Figure 3, and are also consistent with Conjecture 1.

When the input is sparse Weibull, the Wiener filter does not apply, because it is designed
specifically for a Gaussian input and a Gaussian channel. Instead, we compare the $\ell_\infty$ errors of relaxed BP,
$\hat{\mathbf{x}}_5$, $\hat{\mathbf{x}}_{10}$, and $\hat{\mathbf{x}}_{15}$ (10), and the results are shown in Figures 5 and 6. All the minimum mean $\ell_p$
($p = 5, 10, 15$) error estimators perform better than the relaxed BP algorithm in terms of $\ell_\infty$ error. Both figures also suggest
that the optimal $p_{\mathrm{opt}}$ increases as $N$ increases, and thus that Conjecture 1 is not limited to sparse
Gaussian signals.

V. Conclusion 

In this paper, we studied the minimum mean $\ell_\infty$ error estimator for both parallel scalar Gaussian channels and
linear mixing systems. We showed that in both systems (for linear mixing systems, under the assumption stated in Claim 1), when the input signal is i.i.d. sparse Gaussian or i.i.d.
mixture Gaussian, the Wiener filter is asymptotically optimal for minimizing the $\ell_\infty$ error with probability 1. On
the other hand, when the signal dimension $N$ is finite, our previously proposed metric-optimal algorithm with a
proper $\ell_p$ error metric outperforms the Wiener filter in our simulations. A possible direction for future work would be to find a
more general form of input signal in the parallel scalar Gaussian channel setting for which a linear filter is optimal
for $\ell_\infty$ norm error.
Appendix A
Proof of Theorem 1 

In order to show that the Wiener filter is optimal for $\ell_\infty$ norm error in a Gaussian channel with sparse Gaussian
input, we show that the maximum absolute error caused by nonzero input elements is larger than that caused by
zero elements, with a probability that converges to 1.

Consider a set of parallel Gaussian channels (1), where the input signal is $s$-sparse, $x_i \sim s \cdot \mathcal{N}(0,\mu_x) + (1-s)\cdot\delta_0(x_i)$, and $z_i \sim \mathcal{N}(0,\mu_z)$. The Wiener filter (linear estimator) for sparse Gaussian input is $\hat{\mathbf{x}}_{W,SG} = c \cdot \mathbf{r}$, with
$\hat{x}_{W,SG,i} = c \cdot r_i$, where $c = \mu_x/(\mu_x + \mu_z) > 0$. Let $\mathcal{I}$ denote the index set where $x_i \neq 0$, i.e., $\mathcal{I} = \{i : x_i \neq 0\}$, and
let $\mathcal{J}$ denote the index set $\mathcal{J} = \{j : x_j = 0\}$. We define two types of error patterns: (i) $x_i \neq 0$, $i \in \mathcal{I}$, and the error is

$$e_i = c \cdot r_i - x_i = c(x_i + z_i) - x_i = c z_i - (1-c)x_i \sim \mathcal{N}(0, c^2\mu_z + (1-c)^2\mu_x);$$

(ii) $x_j = 0$, $j \in \mathcal{J}$, and the error is

$$e_j = c \cdot r_j - x_j = c z_j \sim \mathcal{N}(0, c^2\mu_z).$$
Fig. 5: The performance of the relaxed BP algorithm and the minimum mean $\ell_5$, $\ell_{10}$, and $\ell_{15}$ estimators in terms of $\ell_\infty$ error in
the linear mixing system. The optimal $p_{\mathrm{opt}}$ increases as $N$ increases. (Sparse Weibull input and Gaussian channel, sparsity rate is
5%, and SNR = 20 dB.)

It has been shown [36] that for a sequence of i.i.d. standard Gaussian random variables $\{X_i\}_i$, the
following equality holds:

$$\Pr\left(\lim_{N\to\infty} (2\ln(N))^{-1/2} \max_{1\le i\le N} X_i = 1\right) = 1. \tag{14}$$

Therefore, for $\mathcal{N}(0,\sigma^2)$ distributed Gaussian variables $\{X_i\}_i$, the equality (14) becomes

$$\Pr\left(\lim_{N\to\infty} (2\ln(N))^{-1/2} \max_{1\le i\le N} \left(\frac{X_i}{\sigma}\right) = 1\right) = 1,$$

or

$$\Pr\left(\lim_{N\to\infty} \frac{\max_{1\le i\le N} X_i}{\sigma\,(2\ln(N))^{1/2}} = 1\right) = 1. \tag{15}$$
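The limit (15) is reached slowly. The following Monte Carlo sketch (standard-library Python; the sample sizes, $\sigma$, and seed are arbitrary choices of ours) prints the normalized maximum $\max_i X_i/(\sigma\sqrt{2\ln N})$ for increasing $N$; for moderate $N$ it typically stays noticeably below 1.

```python
import math
import random


def normalized_max(n, sigma=1.0, rng=random):
    """max_{1<=i<=n} X_i / (sigma * sqrt(2 ln n)) for X_i i.i.d. N(0, sigma^2); cf. (15)."""
    if n < 2:
        raise ValueError("n must be at least 2 so that ln(n) > 0")
    m = max(rng.gauss(0.0, sigma) for _ in range(n))
    return m / (sigma * math.sqrt(2.0 * math.log(n)))


rng = random.Random(0)
for n in (10**2, 10**3, 10**4, 10**5, 10**6):
    print(n, round(normalized_max(n, sigma=2.0, rng=rng), 3))
```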

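For $e_i \sim \mathcal{N}(0, (1-c)^2\mu_x + c^2\mu_z) = \mathcal{N}(0,\sigma_1^2)$ and $e_j \sim \mathcal{N}(0, c^2\mu_z) = \mathcal{N}(0,\sigma_2^2)$, we get from (15) that

$$\Pr\left\{ \lim_{N\to\infty} \frac{\max_{i\in\mathcal{I}} e_i}{\sigma_1\sqrt{2\ln(N(s-\epsilon))}} = 1 \,\middle|\, \mathbf{r}, |\mathcal{I}| = N(s-\epsilon) \right\} = 1,$$

$$\Pr\left\{ \lim_{N\to\infty} \frac{\max_{j\in\mathcal{J}} e_j}{\sigma_2\sqrt{2\ln(N(1-s+\epsilon))}} = 1 \,\middle|\, \mathbf{r}, |\mathcal{J}| = N(1-s+\epsilon) \right\} = 1,$$

where $|\mathcal{I}|$ and $|\mathcal{J}|$ denote the number of elements in the sets $\mathcal{I}$ and $\mathcal{J}$, and $\epsilon > 0$ is arbitrarily small. This indicates
that

$$\Pr\left\{ \lim_{N\to\infty} \frac{\max_{i\in\mathcal{I}} e_i \cdot \sigma_2\sqrt{\ln(N) + \ln(1-s+\epsilon)}}{\max_{j\in\mathcal{J}} e_j \cdot \sigma_1\sqrt{\ln(N) + \ln(s-\epsilon)}} = 1 \,\middle|\, \mathbf{r}, A_\epsilon \right\} = 1,$$

where the event $A_\epsilon$ is defined as

$$A_\epsilon = \{|\mathcal{I}| = N(s-\epsilon),\ |\mathcal{J}| = N(1-s+\epsilon)\},$$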
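for some $\epsilon > 0$. Note that the event $A_\epsilon$ is independent of $\mathbf{r}$. Because the sparsity rate $s$ is a constant and $\epsilon > 0$ is
arbitrarily small,

$$\lim_{N\to\infty} \frac{\sqrt{\ln(N) + \ln(1-s+\epsilon)}}{\sqrt{\ln(N) + \ln(s-\epsilon)}} = 1.$$

Therefore,

$$\Pr\left\{ \lim_{N\to\infty} \frac{\max_{i\in\mathcal{I}} e_i \cdot \sigma_2\sqrt{\ln(N) + \ln(1-s+\epsilon)}}{\max_{j\in\mathcal{J}} e_j \cdot \sigma_1\sqrt{\ln(N) + \ln(s-\epsilon)}} = 1 \,\middle|\, \mathbf{r}, A_\epsilon \right\} = \Pr\left( \lim_{N\to\infty} \frac{\max_{i\in\mathcal{I}} e_i \cdot \sigma_2}{\max_{j\in\mathcal{J}} e_j \cdot \sigma_1} = 1 \,\middle|\, \mathbf{r}, A_\epsilon \right) = 1.$$

Because $\sigma_1 > \sigma_2$ (indeed, $\sigma_1^2 - \sigma_2^2 = (1-c)^2\mu_x > 0$), we have

$$\Pr\left( \lim_{N\to\infty} \max_{i\in\mathcal{I}} e_i > \lim_{N\to\infty} \max_{j\in\mathcal{J}} e_j \,\middle|\, \mathbf{r}, A_\epsilon \right) = 1,$$

i.e.,

$$\Pr\left( \lim_{N\to\infty} \max_{i\in\mathcal{I}} |x_i - \hat{x}_{W,SG,i}| > \lim_{N\to\infty} \max_{j\in\mathcal{J}} |x_j - \hat{x}_{W,SG,j}| \,\middle|\, \mathbf{r}, A_\epsilon \right) = 1.$$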
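Recall that we denote the Wiener filter by $\hat{\mathbf{x}}_{W,SG}$, and let $\mathcal{E}$ denote the event $\{\lim_{N\to\infty} \max_{i\in\mathcal{I}} |x_i - \hat{x}_{W,SG,i}| > \lim_{N\to\infty} \max_{j\in\mathcal{J}} |x_j - \hat{x}_{W,SG,j}|\}$, with complement $\mathcal{E}^c$. Then

$$\begin{aligned}
E\left[\lim_{N\to\infty} \|\mathbf{x}-\hat{\mathbf{x}}_{W,SG}\|_\infty \,\middle|\, \mathbf{r}, A_\epsilon\right]
&= E\left[\lim_{N\to\infty} \max_{i\in\mathcal{I}} |x_i-\hat{x}_{W,SG,i}| \,\middle|\, \mathbf{r}, A_\epsilon, \mathcal{E}\right] \Pr(\mathcal{E} \mid \mathbf{r}, A_\epsilon) \\
&\quad + E\left[\lim_{N\to\infty} \max_{j\in\mathcal{J}} |x_j-\hat{x}_{W,SG,j}| \,\middle|\, \mathbf{r}, A_\epsilon, \mathcal{E}^c\right] \Pr(\mathcal{E}^c \mid \mathbf{r}, A_\epsilon) \\
&= E\left[\lim_{N\to\infty} \max_{i\in\mathcal{I}} |x_i-\hat{x}_{W,SG,i}| \,\middle|\, \mathbf{r}, A_\epsilon, \mathcal{E}\right] \cdot 1 \\
&= E\left[\lim_{N\to\infty} \max_{i\in\mathcal{I}} |x_i-\hat{x}_{W,SG,i}| \,\middle|\, \mathbf{r}, A_\epsilon\right]. \qquad (16)
\end{aligned}$$

For any estimator $\hat{\mathbf{x}}$,

$$\begin{aligned}
E\left[\lim_{N\to\infty} \|\mathbf{x}-\hat{\mathbf{x}}\|_\infty \,\middle|\, \mathbf{r}, A_\epsilon\right]
&= E\left[\lim_{N\to\infty} \max_{i\in\mathcal{I}\cup\mathcal{J}} |x_i-\hat{x}_i| \,\middle|\, \mathbf{r}, A_\epsilon\right] \\
&\ge E\left[\lim_{N\to\infty} \max_{i\in\mathcal{I}} |x_i-\hat{x}_i| \,\middle|\, \mathbf{r}, A_\epsilon\right] \\
&\ge E\left[\lim_{N\to\infty} \max_{i\in\mathcal{I}} |x_i-\hat{x}_{W,SG,i}| \,\middle|\, \mathbf{r}, A_\epsilon\right] \qquad (17) \\
&= E\left[\lim_{N\to\infty} \|\mathbf{x}-\hat{\mathbf{x}}_{W,SG}\|_\infty \,\middle|\, \mathbf{r}, A_\epsilon\right]. \qquad (18)
\end{aligned}$$

Equation (17) holds because the Wiener filter is optimal for Gaussian input signals, and equation (18) holds
because of (16).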
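We have shown that

$$E\left[\lim_{N\to\infty} \|\mathbf{x}-\hat{\mathbf{x}}_{W,SG}\|_\infty \,\middle|\, \mathbf{r}, A_\epsilon\right] \le E\left[\lim_{N\to\infty} \|\mathbf{x}-\hat{\mathbf{x}}\|_\infty \,\middle|\, \mathbf{r}, A_\epsilon\right].$$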



It can also be shown [37] that, for any $\epsilon > 0$,

$$\Pr(A_\epsilon) > 1 - \epsilon\left|\log\frac{s}{1-s}\right|.$$
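Then

$$\begin{aligned}
&\Pr\left( E\left[\lim_{N\to\infty} \|\mathbf{x}-\hat{\mathbf{x}}_{W,SG}\|_\infty \,\middle|\, \mathbf{r}\right] \le E\left[\lim_{N\to\infty} \|\mathbf{x}-\hat{\mathbf{x}}\|_\infty \,\middle|\, \mathbf{r}\right] \right) \\
&= \Pr\left( E\left[\lim_{N\to\infty} \|\mathbf{x}-\hat{\mathbf{x}}_{W,SG}\|_\infty \,\middle|\, \mathbf{r}, A_\epsilon\right] \le E\left[\lim_{N\to\infty} \|\mathbf{x}-\hat{\mathbf{x}}\|_\infty \,\middle|\, \mathbf{r}, A_\epsilon\right] \right) \Pr(A_\epsilon) \\
&\quad + \Pr\left( E\left[\lim_{N\to\infty} \|\mathbf{x}-\hat{\mathbf{x}}_{W,SG}\|_\infty \,\middle|\, \mathbf{r}, A_\epsilon^c\right] \le E\left[\lim_{N\to\infty} \|\mathbf{x}-\hat{\mathbf{x}}\|_\infty \,\middle|\, \mathbf{r}, A_\epsilon^c\right] \right) \Pr(A_\epsilon^c) \\
&= 1 \cdot \Pr(A_\epsilon) + \Pr\left( E\left[\lim_{N\to\infty} \|\mathbf{x}-\hat{\mathbf{x}}_{W,SG}\|_\infty \,\middle|\, \mathbf{r}, A_\epsilon^c\right] \le E\left[\lim_{N\to\infty} \|\mathbf{x}-\hat{\mathbf{x}}\|_\infty \,\middle|\, \mathbf{r}, A_\epsilon^c\right] \right) \Pr(A_\epsilon^c) \\
&\ge \Pr(A_\epsilon) \\
&> 1 - \epsilon\left|\log\frac{s}{1-s}\right|.
\end{aligned}$$

Therefore, $\hat{\mathbf{x}}_{W,SG}$ is asymptotically optimal for $\ell_\infty$ norm error with probability 1 when $\epsilon \to 0$.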



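Appendix B
Proof of Theorem 2

The input signal of the scalar Gaussian channels (1) is i.i.d. mixture Gaussian, $x_i \sim s_1 \cdot \mathcal{N}(0,\mu_1) + s_2 \cdot \mathcal{N}(0,\mu_2) + \cdots + s_K \cdot \mathcal{N}(0,\mu_K)$; suppose without loss of generality that $\mu_1 = \max_{k\in\{1,2,\ldots,K\}} \mu_k$. The Wiener filter is
$\hat{\mathbf{x}}_{W,MG} = c \cdot \mathbf{r}$, with $\hat{x}_{W,MG,i} = c \cdot r_i$, where $c = \mu_1/(\mu_1 + \mu_z) > 0$ is a constant. Let $\mathcal{I}_k$ denote the index set where
$x_i \sim \mathcal{N}(0,\mu_k)$, i.e., $\mathcal{I}_k = \{i : x_i \sim \mathcal{N}(0,\mu_k)\}$ for $k \in \{1,2,\ldots,K\}$. We then define $K$ types of error patterns:
for $i \in \mathcal{I}_k$, $x_i \sim \mathcal{N}(0,\mu_k)$, and the error is $e_{k,i} = c \cdot r_i - x_i = c(x_i + z_i) - x_i = c z_i - (1-c)x_i \sim \mathcal{N}(0, c^2\mu_z + (1-c)^2\mu_k)$.
Because the noise variance $\mu_z$ is a constant, we have

$$\max_{k\in\{1,2,\ldots,K\}} \left(c^2\mu_z + (1-c)^2\mu_k\right) = c^2\mu_z + (1-c)^2\mu_1.$$

Define the event $A_\epsilon$ as

$$A_\epsilon = \{|\mathcal{I}_1| = N(s_1+\epsilon_1),\ |\mathcal{I}_2| = N(s_2+\epsilon_2),\ \ldots,\ |\mathcal{I}_K| = N(s_K+\epsilon_K)\},$$

where $\sum_{k=1}^{K} \epsilon_k = 0$.

Again applying equation (14) and following the same procedure as in the proof of Theorem 1, we get that

$$\Pr\left( \max_{i\in\mathcal{I}_1} e_{1,i} > \max_{i\in\mathcal{I}_k} e_{k,i} \,\middle|\, \mathbf{r}, A_\epsilon \right) = 1$$

for any $k \in \{2,3,\ldots,K\}$. Then, applying the same derivation as in equations (16) and (18), we have

$$E\left[\lim_{N\to\infty} \|\mathbf{x}-\hat{\mathbf{x}}_{W,MG}\|_\infty \,\middle|\, \mathbf{r}, A_\epsilon\right] = E\left[\lim_{N\to\infty} \max_{i\in\mathcal{I}_1} |x_i-\hat{x}_{W,MG,i}| \,\middle|\, \mathbf{r}, A_\epsilon\right]$$

and

$$E\left[\lim_{N\to\infty} \|\mathbf{x}-\hat{\mathbf{x}}\|_\infty \,\middle|\, \mathbf{r}, A_\epsilon\right] \ge E\left[\lim_{N\to\infty} \|\mathbf{x}-\hat{\mathbf{x}}_{W,MG}\|_\infty \,\middle|\, \mathbf{r}, A_\epsilon\right]$$

for any estimand $\hat{\mathbf{x}}$. Moreover, for any $\epsilon_1, \epsilon_2, \ldots, \epsilon_K > 0$ [37],

$$\Pr(A_\epsilon) \ge 1 - \sum_{k=1}^{K} \epsilon_k |\log(s_k)|.$$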
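Combining these results as in the proof of Theorem 1, we finally get

$$\Pr\left( E\left[\lim_{N\to\infty} \|\mathbf{x}-\hat{\mathbf{x}}_{W,MG}\|_\infty \,\middle|\, \mathbf{r}\right] \le E\left[\lim_{N\to\infty} \|\mathbf{x}-\hat{\mathbf{x}}\|_\infty \,\middle|\, \mathbf{r}\right] \right) \ge 1 - \sum_{k=1}^{K} \epsilon_k |\log(s_k)|.$$

Therefore, $\hat{\mathbf{x}}_{W,MG}$ is asymptotically optimal for $\ell_\infty$ norm error with probability 1 when $\epsilon_k \to 0$ for all $k$.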